20 research outputs found

    MMSE of probabilistic low-rank matrix estimation: Universality with respect to the output channel

    Full text link
    This paper considers probabilistic estimation of a low-rank matrix from non-linear element-wise measurements. We derive the corresponding approximate message passing (AMP) algorithm and its state evolution. Relying on non-rigorous but standard assumptions motivated by statistical physics, we characterize the minimum mean squared error (MMSE) achievable information-theoretically and with the AMP algorithm. Unlike in related problems of linear estimation, in the present setting the MMSE depends on the output channel only through a single parameter: its Fisher information. We illustrate this striking finding by an analysis of submatrix localization, and of detection of communities hidden in a dense stochastic block model. For this example we locate the computational and statistical boundaries, which are not equal for rank larger than four.
    Comment: 10 pages, Allerton Conference on Communication, Control, and Computing 201
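    The channel-universality statement above can be checked numerically: the theory predicts that a (suitably smooth) output channel enters the MMSE only through its Fisher information. Below is a minimal sketch (not code from the paper; the function name, grid, and finite-difference scheme are ours) that estimates the Fisher information of a Gaussian output channel and recovers the closed-form value 1/Δ.

    ```python
    import numpy as np

    def fisher_information(log_p, w=0.0, eps=1e-5, grid=None):
        """Numerically estimate I(w) = E_y[(d/dw log p(y|w))^2]
        using a central finite difference for the score and a
        Riemann sum over a fine grid of outputs y."""
        if grid is None:
            grid = np.linspace(-10.0, 10.0, 20001)
        p = np.exp(log_p(grid, w))
        score = (log_p(grid, w + eps) - log_p(grid, w - eps)) / (2.0 * eps)
        dy = grid[1] - grid[0]
        return np.sum(score**2 * p) * dy

    # Gaussian output channel y = w + noise of variance Delta:
    # the Fisher information should come out as 1/Delta.
    Delta = 0.5
    log_gauss = lambda y, w: -0.5*np.log(2*np.pi*Delta) - (y - w)**2 / (2*Delta)
    I = fisher_information(log_gauss)   # close to 1/Delta = 2
    ```

    Per the universality result, any other channel with the same Fisher information is predicted to yield the same MMSE.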

    Phase Transitions in Sparse PCA

    Full text link
    We study optimal estimation for sparse principal component analysis when the number of non-zero elements is small but on the same order as the dimension of the data. We employ the approximate message passing (AMP) algorithm and its state evolution to analyze the information-theoretically minimal mean-squared error and the one achieved by AMP in the limit of large sizes. For the special case of rank one and large enough density of non-zeros, Deshpande and Montanari [1] proved that AMP is asymptotically optimal. We show that both for low density and for large rank the problem undergoes a series of phase transitions, suggesting the existence of a region of parameters where estimation is information-theoretically possible, but AMP (and presumably every other polynomial algorithm) fails. The analysis of the large-rank limit is particularly instructive.
    Comment: 6 pages, 3 figures
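    The scalar building block that AMP iterates in this sparse setting is the posterior-mean denoiser of a sparse prior under Gaussian noise. A minimal sketch, assuming a spike-and-slab (Bernoulli-Gaussian) prior ρ·N(0,1) + (1−ρ)·δ₀ (the function name and test values are ours, not from the paper):

    ```python
    import numpy as np

    def bg_denoiser(y, sigma2, rho):
        """Posterior mean E[x | y] for the scalar channel y = x + N(0, sigma2)
        with spike-and-slab prior x ~ rho*N(0, 1) + (1 - rho)*delta_0."""
        # evidence of the slab (Gaussian) and spike (point mass at 0) components
        slab = rho * np.exp(-y**2 / (2*(1 + sigma2))) / np.sqrt(2*np.pi*(1 + sigma2))
        spike = (1 - rho) * np.exp(-y**2 / (2*sigma2)) / np.sqrt(2*np.pi*sigma2)
        w = slab / (slab + spike)       # posterior probability that x is non-zero
        return w * y / (1 + sigma2)     # slab posterior mean, weighted by w

    y = np.array([-3.0, -0.1, 0.0, 0.1, 3.0])
    xhat = bg_denoiser(y, sigma2=0.5, rho=0.1)   # strong shrinkage near zero
    ```

    For ρ = 1 this reduces to plain Gaussian shrinkage y/(1+σ²); for small ρ, observations near zero are shrunk almost entirely to zero, which is what makes the sparse estimate non-linear.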

    Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula

    Full text link
    Factorizing low-rank matrices has many applications in machine learning and statistics. For probabilistic models in the Bayes-optimal setting, a general expression for the mutual information has been proposed using heuristic statistical physics computations, and proven in a few specific cases. Here, we show how to rigorously prove the conjectured formula for the symmetric rank-one case. This allows us to express the minimal mean-square error and to characterize the detectability phase transitions in a large set of estimation problems ranging from community detection to sparse PCA. We also show that for a large set of parameters, an iterative algorithm called approximate message passing is Bayes-optimal. There exists, however, a gap between what currently known polynomial algorithms can do and what is expected to be possible information-theoretically. Additionally, the proof technique is of interest in its own right and exploits three essential ingredients: the interpolation method introduced in statistical physics by Guerra, the analysis of the approximate message passing algorithm, and the theory of spatial coupling and threshold saturation in coding. Our approach is generic and applicable to other open problems in statistical estimation where heuristic statistical physics predictions are available.
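    In the symmetric rank-one case with a ±1 prior, the analysis of AMP reduces to a scalar state-evolution recursion on the overlap m. A hedged sketch (our own naming and quadrature choices, under the standard normalisation with signal-to-noise λ), illustrating the detectability transition at λ = 1:

    ```python
    import numpy as np

    def state_evolution(lam, m0=0.5, iters=200):
        """State evolution for symmetric rank-one estimation with x_i = +/-1:
        m_{t+1} = E_z tanh(lam*m_t + sqrt(lam*m_t)*z),  z ~ N(0, 1)."""
        nodes, weights = np.polynomial.hermite.hermgauss(60)
        z = np.sqrt(2.0) * nodes          # Gauss-Hermite -> standard normal
        w = weights / np.sqrt(np.pi)
        m = m0
        for _ in range(iters):
            m = max(m, 0.0)               # guard against tiny negative round-off
            m = float(np.sum(w * np.tanh(lam*m + np.sqrt(lam*m)*z)))
        return m

    m_easy = state_evolution(2.0)   # above the transition at lam = 1
    m_hard = state_evolution(0.5)   # below it: the overlap decays to zero
    ```

    The fixed point of this recursion gives the asymptotic overlap of AMP, and (in the regimes where AMP is Bayes-optimal) the MMSE via 1 − m.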

    Factorisation matricielle et tensorielle par une approche issue de la physique statistique

    No full text
    In this thesis we present results on low-rank matrix and tensor factorization. Matrices being such ubiquitous mathematical objects, a lot of machine learning can be mapped to a low-rank matrix factorization problem. It is, for example, one of the basic methods used in data analysis for unsupervised learning of relevant features and other types of dimensionality reduction. The results presented in this thesis have been included in previous work [LKZ 2015]. The problem of low-rank matrix estimation becomes harder once one adds constraints, like for instance the positivity of one of the factors of the factorization. We present a framework to study the constrained low-rank matrix estimation for a general prior on the factors, and a general output channel through which the matrix is observed. We draw a parallel with the study of vector-spin glass models -- presenting a unifying way to study a number of problems considered previously in separate statistical physics works. We present a number of applications for the problem in data analysis. We derive in detail a general form of the low-rank approximate message passing (Low-RAMP) algorithm that is known in statistical physics as the TAP equations. We thus unify the derivation of the TAP equations for models as different as the Sherrington-Kirkpatrick model, the restricted Boltzmann machine, the Hopfield model or vector (xy, Heisenberg and other) spin glasses. The state evolution of the Low-RAMP algorithm is also derived, and is equivalent to the replica symmetric solution for the large class of vector-spin glass models. In the section devoted to results we study in detail phase diagrams and phase transitions for the Bayes-optimal inference in low-rank matrix estimation. We present a typology of phase transitions and their relation to the performance of algorithms such as Low-RAMP or commonly used spectral methods.

    Matricial and tensorial factorisation using tools coming from statistical physics

    No full text
    In this thesis we present results on low-rank matrix and tensor factorization. Matrices being such ubiquitous mathematical objects, a lot of machine learning can be mapped to a low-rank matrix factorization problem. It is, for example, one of the basic methods used in data analysis for unsupervised learning of relevant features and other types of dimensionality reduction. The results presented in this thesis have been included in previous work [LKZ 2015]. The problem of low-rank matrix estimation becomes harder once one adds constraints, like for instance the positivity of one of the factors of the factorization. We present a framework to study the constrained low-rank matrix estimation for a general prior on the factors, and a general output channel through which the matrix is observed. We draw a parallel with the study of vector-spin glass models -- presenting a unifying way to study a number of problems considered previously in separate statistical physics works. We present a number of applications for the problem in data analysis. We derive in detail a general form of the low-rank approximate message passing (Low-RAMP) algorithm that is known in statistical physics as the TAP equations. We thus unify the derivation of the TAP equations for models as different as the Sherrington-Kirkpatrick model, the restricted Boltzmann machine, the Hopfield model or vector (xy, Heisenberg and other) spin glasses. The state evolution of the Low-RAMP algorithm is also derived, and is equivalent to the replica symmetric solution for the large class of vector-spin glass models. In the section devoted to results we study in detail phase diagrams and phase transitions for the Bayes-optimal inference in low-rank matrix estimation. We present a typology of phase transitions and their relation to the performance of algorithms such as Low-RAMP or commonly used spectral methods.

    Constrained Low-rank Matrix Estimation: Phase Transitions, Approximate Message Passing and Applications

    No full text
    64 pages, 12 figures
    This article is an extended version of previous work of the authors [40, 41] on low-rank matrix estimation in the presence of constraints on the factors into which the matrix is factorized. Low-rank matrix factorization is one of the basic methods used in data analysis for unsupervised learning of relevant features and other types of dimensionality reduction. We present a framework to study the constrained low-rank matrix estimation for a general prior on the factors, and a general output channel through which the matrix is observed. We draw a parallel with the study of vector-spin glass models - presenting a unifying way to study a number of problems considered previously in separate statistical physics works. We present a number of applications for the problem in data analysis. We derive in detail a general form of the low-rank approximate message passing (Low-RAMP) algorithm, that is known in statistical physics as the TAP equations. We thus unify the derivation of the TAP equations for models as different as the Sherrington-Kirkpatrick model, the restricted Boltzmann machine, the Hopfield model or vector (xy, Heisenberg and other) spin glasses. The state evolution of the Low-RAMP algorithm is also derived, and is equivalent to the replica symmetric solution for the large class of vector-spin glass models. In the section devoted to results we study in detail phase diagrams and phase transitions for the Bayes-optimal inference in low-rank matrix estimation. We present a typology of phase transitions and their relation to the performance of algorithms such as Low-RAMP or commonly used spectral methods.
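    As an illustration of the TAP connection mentioned above, here is a sketch (our own code, not from the article) iterating the classical TAP equations for the Sherrington-Kirkpatrick model, including the Onsager reaction term β²(1−q)m_i; at high temperature the damped iteration relaxes to the paramagnetic solution m ≈ 0:

    ```python
    import numpy as np

    def tap_sk(J, beta, iters=200, damping=0.5, seed=0):
        """Damped iteration of the TAP equations for the SK model:
        m_i = tanh(beta * sum_j J_ij m_j - beta**2 * (1 - q) * m_i),
        with q = mean(m_j^2) (Onsager reaction term)."""
        rng = np.random.default_rng(seed)
        m = 0.1 * rng.standard_normal(J.shape[0])   # small random start
        for _ in range(iters):
            q = np.mean(m**2)
            new = np.tanh(beta * (J @ m) - beta**2 * (1.0 - q) * m)
            m = damping * m + (1.0 - damping) * new
        return m

    N = 200
    rng = np.random.default_rng(1)
    G = rng.standard_normal((N, N)) / np.sqrt(N)
    J = (G + G.T) / np.sqrt(2.0)     # symmetric couplings, J_ij ~ N(0, 1/N)
    m = tap_sk(J, beta=0.3)          # high temperature: paramagnet, m -> 0
    ```

    Low-RAMP generalises exactly this structure: a prior-dependent denoiser in place of tanh and a matched Onsager correction.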

    Statistical and computational phase transitions in spiked tensor estimation

    No full text
    8 pages, 3 figures, 1 table
    We consider tensor factorizations using a generative model and a Bayesian approach. We compute rigorously the mutual information, the minimal mean square error (MMSE), and unveil information-theoretic phase transitions. In addition, we study the performance of approximate message passing (AMP) and show that it achieves the MMSE for a large set of parameters, and that factorization is algorithmically "easy" in a much wider region than previously believed. There exists, however, a "hard" region where AMP fails to reach the MMSE, and we conjecture that no polynomial algorithm will improve on AMP.
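    The "easy"/"hard" dichotomy can be illustrated with the scalar state evolution for an order-p spiked tensor with ±1 prior, under one common normalisation in which the effective signal-to-noise ratio is λ·m^(p−1) (a sketch with our own naming; prefactor conventions vary between papers): from an uninformative initialisation the overlap collapses to zero even at signal strengths where an informative initialisation yields near-perfect recovery.

    ```python
    import numpy as np

    def tensor_state_evolution(lam, m0, iters=300, p=3):
        """State evolution for an order-p spiked tensor with +/-1 prior,
        with effective signal-to-noise s = lam * m^(p-1):
        m_{t+1} = E_z tanh(s + sqrt(s)*z),  z ~ N(0, 1)."""
        nodes, weights = np.polynomial.hermite.hermgauss(60)
        z = np.sqrt(2.0) * nodes
        w = weights / np.sqrt(np.pi)
        m = m0
        for _ in range(iters):
            s = lam * max(m, 0.0)**(p - 1)
            m = float(np.sum(w * np.tanh(s + np.sqrt(s)*z)))
        return m

    # same signal strength, different initialisations:
    m_rand = tensor_state_evolution(lam=10.0, m0=0.01)  # stuck near zero ("hard")
    m_info = tensor_state_evolution(lam=10.0, m0=0.9)   # near-perfect recovery
    ```

    For p ≥ 3 the overlap m = 0 is a stable fixed point, which is exactly why AMP started from scratch can fail while the information-theoretic MMSE is still small.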

    Phase transitions and optimal algorithms in high-dimensional Gaussian mixture clustering

    No full text
    8 pages, 3 figures, conference
    We consider the problem of Gaussian mixture clustering in the high-dimensional limit where the data consist of m points in n dimensions, with n, m → ∞ while the ratio α = m/n stays finite. Using exact but non-rigorous methods from statistical physics, we determine the critical value of α and the distance between the clusters at which it becomes information-theoretically possible to reconstruct the membership into clusters better than chance. We also determine the accuracy achievable by the Bayes-optimal estimation algorithm. In particular, we find that when the number of clusters is sufficiently large, r > 4 + 2√α, there is a gap between the threshold for information-theoretically optimal performance and the threshold at which known algorithms succeed.
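    For intuition, here is a toy two-cluster instance (our own construction, with the separation chosen far above the detection threshold; not the r-cluster setting analysed in the paper) where a commonly used spectral method, the top eigenvector of the Gram matrix, recovers the hidden membership:

    ```python
    import numpy as np

    def spectral_labels(X):
        """Guess two-cluster membership from the sign of the top
        eigenvector of the Gram matrix of the data."""
        B = X @ X.T / X.shape[1]           # m x m Gram matrix
        vals, vecs = np.linalg.eigh(B)     # eigenvalues in ascending order
        return np.sign(vecs[:, -1])

    rng = np.random.default_rng(0)
    m_pts, n_dim = 600, 300                     # alpha = m/n = 2
    s = rng.choice([-1.0, 1.0], size=m_pts)     # hidden cluster labels
    v = rng.standard_normal(n_dim)
    v *= 3.0 / np.linalg.norm(v)                # ||v||^2 = 9: well-separated means
    X = np.outer(s, v) + rng.standard_normal((m_pts, n_dim))
    labels = spectral_labels(X)
    acc = max(np.mean(labels == s), np.mean(labels == -s))   # sign ambiguity
    ```

    Closer to the threshold this spectral estimate degrades before the Bayes-optimal one does, which is the kind of gap the paper quantifies.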